Abstract:
The present disclosure relates to systems, methods, and computer program products for processing data from hierarchical documents (for example, XML and JSON documents) and storing it in relational database structures. The methods described read the data from the input documents and analyze each document's schema to create both the dynamic data tables and the metadata tables that describe the target tabular structure. The key-value data stored in the document is then extracted, transformed, and mapped to the generated table structure, which is referenced in the metadata tables. In this way, the information can be stored in databases or in tabular or relational structures and reconstructed into the original document if necessary. (Machine-translation by Google Translate, not legally binding)
Publication number: ES2668723A1
Application number: ES201631481
Filing date: 2016-11-18
Publication date: 2018-05-21
Inventors: Álvaro HERNÁNDEZ TORTOSA; Yeray DARIAS CAMACHO; Matteo Melli; Gonzalo ORTIZ JAUREGUIZAR; Jerónimo LÓPEZ BEZANILLA
Applicants: 8kdata Tech S L; 8kdata Technology S L
Main IPC class:
Description:

MACHINE FOR EXTRACTION, MAPPING AND STORAGE OF HIERARCHICAL DATA
The invention described herein relates in general to systems and methods for storing, transforming and extracting data, from or to data sources, which use, at least in part, tabular structures. In particular, the invention relates to systems and methods for storing, transforming and extracting data from documents stored in structured and unstructured data sources.
Background of the invention
NoSQL databases, semi-structured data processing and storage systems, and other so-called schema-less software accept key-value data structures, or "documents", as input. These documents are a convenient way to represent data of a hierarchical nature, in which there is little or no restriction on the content or structure (schema) of the data. In particular, the structure, contents and keys of different documents, even if they are logically grouped, may be completely different from each other.
The structure of a document can include a set of key-value pairs, in which the key is a name (for example, a text string) and the value can be a scalar (for example, a number, text, a boolean, a null value, etc.) or a composite value. Composite values include nested values, such as an embedded document or a collection of other scalar or composite values. For some particular software processes, the key-value pairs may take the form of a single, indivisible stored unit of nested key-value structures. The data (for example, the value) contained in the document is typically accessed through one or more keys (although a key is not necessarily required to access the data), which can be generated keys or keys extracted and/or copied from the original document, and which uniquely identify the document.
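The distinction between scalar and composite values can be sketched as follows. This is a minimal illustration, assuming a document has already been parsed into Python dictionaries and lists; the helper names `is_scalar` and `split_pairs` and the sample document are hypothetical and do not come from the specification.

```python
from typing import Any, Dict, Tuple

# Scalar kinds named in the text: numbers, text, booleans and null values.
SCALAR_TYPES = (int, float, str, bool, type(None))


def is_scalar(value: Any) -> bool:
    """Return True for a scalar value, False for a composite (nested) value."""
    return isinstance(value, SCALAR_TYPES)


def split_pairs(document: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split the key-value pairs of one document level into scalars and composites."""
    scalars = {k: v for k, v in document.items() if is_scalar(v)}
    composites = {k: v for k, v in document.items() if not is_scalar(v)}
    return scalars, composites


# Hypothetical sample document: two scalar pairs, one embedded document, one collection.
doc = {
    "name": "The Martian",
    "copies": 3,
    "author": {"name": "Andy Weir"},
    "tags": ["fiction"],
}
scalars, composites = split_pairs(doc)
```

Here `scalars` holds the `name` and `copies` pairs, while `composites` holds the embedded document and the collection.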


Conventional access to document data is performed by accessing (for example, querying) the data through an (external) index that indexes the primary key or one or more other fields of the document. However, conventional access to document data is not always desirable, optimal, or possible in certain data access scenarios, such as non-indexed queries, aggregate queries, or queries on data in nested fields or nested documents. In these situations, the methods implemented for access to document data use computing resources inefficiently (for example, long CPU times and high memory usage). To be more explicit, a non-indexed query on a collection of documents requires a full read of the collection. A full read of the collection requires analyzing all the data document by document, key by key, and applying the query predicate to each document. This operation has the technical disadvantages of long processing times and frequent I/O and CPU bottlenecks, and it exhibits a poor cache usage pattern.
In addition to the above, current systems encounter difficulties when processing documents that have an inefficient schema or are schema-less, because the structure used must be defined for each document. If the documents within a given set share the same underlying structure, or one that is similar enough, a significant computational overhead is incurred due to the unnecessary repetition of the schema, which leads to an inefficient use of space, memory and processing.
Therefore, there is a need for systems and methods to extract, transform and store the document data from a data source in which the documents have a variable schema or are schema-less, to improve the management of computational resources. In addition, there is a need for systems and methods that can efficiently process nested key-value hierarchical data.
It is with respect to these and other problems that the present invention is provided.
Description of the invention


In one aspect, embodiments of the invention relate to a method for mapping one or more key-value pairs associated with a document to one or more tabular structures, where each key-value pair has a key name and a value, and each of said tabular structures has one or more rows and columns to store values. For example, the tabular structure may be persistent storage. According to one or more embodiments, the method comprises reading the document, with a document reader, to identify the key-value pairs associated with the input document. For example, the document can come from a data source, such as a relational database, and can be in any of many formats, such as JSON or XML. The method then determines whether the value associated with each key-value pair is a scalar value or a composite value.
In the case where the value associated with the key-value pair is a scalar value, the method performs the steps of extracting the key name of the key-value pair and storing the value of the key-value pair in a row of the tabular structure. In one embodiment, the method also performs the step of checking whether the tabular structure has a column associated with the key name extracted from the key-value pair. This may include generating a new column associated with the tabular structure, which stores the value of the key-value pair in a row of the tabular structure. In some embodiments, the column is identified by a type associated with the value of the key-value pair.
In the event that the value associated with the key-value pair is a composite value, the method performs the steps of extracting the key name of the key-value pair and generating a tabular substructure associated with the key name of the key-value pair. In some embodiments, in the event that the value associated with the key-value pair is a composite value, the method also performs the step of sending the value of the key-value pair to a temporary data structure. In other embodiments, the method performs the step of recursively iterating the preceding steps until each of the key-value pairs has been extracted to a tabular structure or tabular substructure.
In addition, the method in one or more embodiments includes the processing of the key-value pairs contained in composite values that are sent to temporary data structures. For example, the temporary data structure can be a stack of operations or a linked list. In particular, the method determines whether the composite value associated with the key-value pair sent to the temporary data structure includes one or more key-value sub-pairs. In addition, the method determines, for each of the key-value sub-pairs, whether its value is a scalar value or a composite value. In the event that the value associated with the key-value sub-pair is determined to be a scalar value, the method performs the steps of extracting the key name of the key-value sub-pair and checking whether the tabular substructure has a column associated with the key name of the key-value sub-pair.
Continuing with the above, if the check succeeds, the method stores the value of the key-value sub-pair in a row of the tabular substructure and maps the value of the key-value sub-pair to the column. If not, the method performs the steps of generating a new column associated with the tabular substructure, storing the value of the key-value sub-pair in a row of the tabular substructure and mapping the value of the key-value sub-pair to the new column. In the event that the value associated with the key-value sub-pair is a composite value, the method performs the steps of extracting the key name of the key-value sub-pair and sending the value of the key-value sub-pair to the temporary data structure. In the event that the value associated with the key-value sub-pair is not a composite value, the method performs the step of removing the previously sent key-value pair from the temporary data structure. The above steps can be iterated until the temporary data structure no longer contains any previously sent key-value pairs.
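The column check described above (reuse the column if it exists, otherwise generate it, then store and map the value) can be sketched as follows. This is a minimal in-memory illustration only; the `Table` class, its method names and the sample values are hypothetical and stand in for whatever persistent tabular storage an embodiment uses.

```python
from typing import Any, Dict, List


class Table:
    """Hypothetical in-memory stand-in for a tabular structure or substructure."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.columns: List[str] = []
        self.rows: List[Dict[str, Any]] = []

    def new_row(self) -> Dict[str, Any]:
        """Append an empty row and return it so values can be stored in it."""
        row: Dict[str, Any] = {}
        self.rows.append(row)
        return row

    def store(self, row: Dict[str, Any], key: str, value: Any) -> bool:
        """Store a scalar under its key name; return True if a column was generated."""
        created = key not in self.columns
        if created:
            self.columns.append(key)  # no matching column yet: generate it
        row[key] = value              # store the value and map it to the column
        return created


sub = Table("books-author")
first = sub.new_row()
created_first = sub.store(first, "name", "Andy Weir")   # column is generated
second = sub.new_row()
created_second = sub.store(second, "name", "Jane Doe")  # existing column is reused
```

The first call generates the `name` column, while the second call finds it already present, so the column definition exists only once however many rows are stored.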
In another aspect, embodiments of the invention relate to systems for extracting and storing document data in a database. In one or more embodiments, the system includes a data analysis apparatus that includes a processor and a memory coupled to the processor. In addition, the system includes a data source that contains one or more documents. Each document contains one or more key-value pairs, and each key-value pair has a key name and a value, in which each value has a value type. In addition, the system includes a document reader to receive documents from the data source through a network, the document reader being coupled to the data analysis apparatus. Additionally, the system includes a structure extraction module.
The structure extraction module implements processor program code to generate one or more tabular structures, which have at least one column corresponding to the key name and the value type of each of the key-value pairs. Likewise, the system includes a data extraction module. The data extraction module implements processor program code to extract the value into a row of data in the tabular structures created by the structure extraction module. Finally, the system includes one or more metadata tables generated by the data extraction module with references to the one or more tabular structures.
Brief description of the drawings
The invention is illustrated in the figures of the attached drawings, which are intended to be examples and not limitations, in which like references are intended to refer to corresponding or similar parts, and in which:
Fig. 1 presents a block diagram illustrating a system for mapping document data to a tabular structure according to an embodiment of the present invention;
Fig. 2 presents an illustration showing a hierarchy of example key-value pairs according to an embodiment of the present invention;
Fig. 3 presents an example flow of the method for mapping document data to a tabular structure according to an embodiment of the present invention;
Fig. 4 shows a flow of the mapping method of one or more key-value pairs associated with a document in one or more tabular structures according to an embodiment of the present invention;
Fig. 5 shows a flow of the method for generating the columns of the tabular structures according to an embodiment of the present invention;
Fig. 6 presents an illustration showing the resolution of the type conflicts of the values according to an embodiment of the present invention;
Fig. 7 presents an illustration showing the generation of the metadata tables according to an embodiment of the present invention;


Fig. 8 presents an example document structure containing multiple key-value pairs to be processed in a tabular structure according to an embodiment of the present invention;
Fig. 9 presents an illustration showing the data of the document of Fig. 8 mapped to a tabular structure generated in accordance with an embodiment of the present invention;
Fig. 10 presents an illustration showing the root processing of the example document of Fig. 8 according to an embodiment of the present invention;
Fig. 11 presents an illustration showing the processing of the first sub-level of the example document of Fig. 8 according to an embodiment of the present invention;
Fig. 12 presents an illustration showing the processing of the second sub-level of the example document of Fig. 8 according to an embodiment of the present invention; and
Fig. 13 presents an illustration showing an example operation on the stack according to an embodiment of the present invention.
Description of a preferred embodiment
Throughout the specification and the claims, terms may have a specific nuance, suggested or implied by the context, beyond their explicitly stated meaning. Likewise, the phrase "in an implementation" or "in one embodiment" as used in this document does not necessarily refer to a different implementation or embodiment. Similarly, the phrase "one or more implementations" or "one or more embodiments" as used in this document does not necessarily refer to the same implementation or embodiment, and the phrase "at least one implementation" or "at least one embodiment" as used in this document does not necessarily refer to a different implementation or embodiment. The intention is, for example, that the claimed subject matter of this document include combinations of example implementations and embodiments in whole or in part.


This document provides systems and methods for mapping data associated with a document, for example one or more key-value pairs, to one or more tabular structures or other relational data formats. In this document, a "document" is a set, not necessarily ordered, of key-value pairs, where the key is a data identifier (for example, a name) and the value is either a scalar value or a composite value. Composite values can be heterogeneous or homogeneous nested documents or collections of other scalar and/or composite values. For example, a composite value can be an ordered grouping or a subdocument contained within a document. Documents include data stored at one or more levels of the document. For example, each document has a root or base level. Beyond that, the document includes at least one sub-level for each key-value pair that is stored at the root level and has a composite value. The sub-levels can in turn include further composite key-value pairs, which is why a document can include as many levels as there are nested subdocuments. In addition, documents can be grouped into one or more collections, which, in turn, can be grouped into databases. A collection can include grouped documents that are related to each other (for example, documents of the same type) or grouped documents that are unrelated.
In particular, the systems and methods described in this document improve on conventional practices for storing data from unstructured documents by transforming that data into tabular structured data or other relational data systems. A tabular structure is one possible form of persistent storage that groups data into tables. A tabular structure is composed of rows and columns. Each record or tuple is stored in a row, and each column defines an attribute of the row with a name and a type. For example, the value of a key-value pair is stored in a row of a table and is associated with a particular column that is defined according to the key name and the value type of the key-value pair (for example, integer, float, double, string, character, boolean, etc., where integer is an integer numeric type, float is a numeric type represented with a floating point, double is the same as float but with double precision, string is a text string, character is a data type that represents a single character, and boolean is a binary logic data type that can be true or false).
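How a column might be defined from the key name and the value type can be sketched as follows. This is only an illustration: the function name `column_for` and the type labels it returns are assumptions, chosen to echo the types listed above, and the mapping from Python types to those labels is not taken from the specification.

```python
from typing import Any, Tuple


def column_for(key: str, value: Any) -> Tuple[str, str]:
    """Return a (column name, type label) pair for a scalar key-value pair."""
    if value is None:
        type_label = "null"
    elif isinstance(value, bool):
        # bool is tested before int because bool is a subclass of int in Python.
        type_label = "boolean"
    elif isinstance(value, int):
        type_label = "integer"
    elif isinstance(value, float):
        type_label = "double"  # Python floats are double precision
    elif isinstance(value, str):
        type_label = "string"
    else:
        raise TypeError(f"{key!r} holds a composite value, not a scalar")
    return key, type_label


columns = [column_for(k, v) for k, v in {"copies": 3, "price": 9.5, "in_stock": True}.items()]
```

Keeping the type alongside the key name means two pairs that share a key but differ in value type resolve to different column definitions, rather than silently sharing one.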


In one aspect of the present invention, the systems and methods shown and described herein include two parallel data processes. First, the present application details the methods to extract data structures from a document and then to dynamically generate and/or update a tabular structure (for example, tables) to represent the content of the document data based on the structure extracted from that data. In one or more embodiments, the methods provided herein define metadata information in corresponding metadata tables. This metadata information references the structures extracted from the data, so that the tabular structure is dynamically updated as new input document data arrives while correctly referencing the data already stored. Second, the present application details the methods for reading the document data in order to create and store the data records in the tabular structures generated by the first process. This includes documents that have composite values, such as nested documents (for example, a key-value pair whose value is itself a document). The method shown and described here generates additional tables for each key-value pair that has nested documents.
The information is represented as a set of tabular data or relational tables in a relational system according to the design of the system schema. Classifying the data in this way allows queries to run without a full read of the database, since only some of the tables, or a subset of the data in the tables, are targeted by the query criteria. This improves query capabilities, reduces I/O and CPU bottlenecks and improves cache utilization, compared to conventional practices for handling hierarchical or nested unstructured document data. For example, a tabular or relational system, such as the one shown and described here, defines the table and column information only once, which allows a considerable reduction in storage and cache usage compared to the data stored by conventional NoSQL databases.
Fig. 1 illustrates a system 100 for mapping data from a data source to a tabular structure according to one or more embodiments of the present invention. The system 100 includes a data processing apparatus 105 having a processor 110, a memory 115 coupled to the processor, and one or more software modules 120 that implement, through the processor, program code stored in the memory to carry out aspects of the data mapping shown and described here. For example, software modules 120 may include a structure extractor 125 to generate tabular structures to store the data and a data extractor 130 to extract the data contained in a document and store that extracted data in a generated tabular structure. The data processing apparatus 105 may include, for example, servers, laptops and/or desktop computers, and computing devices such as tablets, smartphones, personal digital assistants or the like. Memory 115 may be used to store data, metadata, and programs to be executed by processor 110. Memory 115 may include one or more volatile and non-volatile memories, such as random access memory ("RAM"), read-only memory ("ROM"), flash memory, phase change memory ("PCM"), or other types.
The data processing apparatus 105 is configured to access the data source 140. The data source 140 may be local to the data processing apparatus 105, or remote, in which case the two are connected through a network 135 (for example, a wired or wireless network, 3G/4G networks, etc.). In one or more implementations, data source 140 is a database. For example, data source 140 may be a relational database. The data in the data source 140 is stored in one or more collections 145, and each collection has one or more documents 150. The documents 150 include unstructured data stored along a path in the document in the form of a set of key-value pairs, in which each key-value pair includes a key (or "key name"), which is an identifier for the data value associated with that key. The data value (or "value") can be a scalar value (for example, an integer, double, string, character, float, boolean, empty value, etc.) or a composite value (for example, an ordered grouping, a record, a set, a function, a nested value, etc.). In some cases, a key-value pair can consist of a string, for example, the name of a book, and a value corresponding to that string, for example, "The Martian." A document is structured so that it first contains the key-value pairs at a root level. If a document includes nested data at the root level (or any sub-level), that is, a key-value pair that has a composite value (for example, another key-value pair), then that data is stored in a sub-level. In one or more embodiments, a document 150 is stored in a particular data exchange format. For example, document 150 can be stored in JavaScript Object Notation ("JSON"), Extensible Markup Language ("XML"), Resource Description Framework ("RDF"), YAML, Rebol, or Gellish.
Fig. 2 illustrates the hierarchical key-value structure of an example document 200 according to one or more embodiments of this document. In this example document 200, a root level 205 includes four or more key-value pairs, denoted by K1, K2, K3 and K4. The key-value pair K1 has a value V1, which is a composite value. In the illustrated example, V1 contains two nested key-value pairs, K5 and K6, which are stored in a first sub-level 210. The key-value pair K2 has a value V2, which is a scalar value, and consequently has no sub-level associated with it. The key-value pair K3 has an associated empty value, which also means that it has no sub-level associated with it. The key-value pair K4 has a value of [V7, V8], which is an ordered grouping (a composite value). Ordered groupings are stored in a sub-level, and Fig. 2 illustrates that the ordered grouping K4 stores the scalar value V7 in a second sub-level 215 and the composite value V8 in a third sub-level 220. V8 contains two scalar key-value pairs, K9 and K10; with no further composite values identified, the document is stored in its entirety.
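The hierarchy of Fig. 2 can be written out as a nested structure to make the levels concrete. In the sketch below, the keys K1 to K10 and the values V2 and V7 follow the figure description, while the scalar placeholders "V5", "V6", "V9" and "V10" and the helper `nesting_depth` are assumptions added for illustration.

```python
from typing import Any

# Example document 200; "V5", "V6", "V9" and "V10" are assumed placeholder scalars.
document = {
    "K1": {"K5": "V5", "K6": "V6"},            # V1: composite, first sub-level
    "K2": "V2",                                # scalar, no sub-level
    "K3": None,                                # empty value, no sub-level
    "K4": ["V7", {"K9": "V9", "K10": "V10"}],  # ordered grouping [V7, V8]
}


def nesting_depth(value: Any) -> int:
    """Count the nested composite levels at or below a value (0 for a scalar)."""
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return 0
    return 1 + max((nesting_depth(child) for child in children), default=0)


depth = nesting_depth(document)  # root level, the grouping K4, and V8 inside it
```

For this document `depth` is 3: the root level, the ordered grouping under K4, and the subdocument V8 nested inside that grouping.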
Referring now to Fig. 3, a method is provided for mapping document data, in the form of one or more key-value pairs associated with the document, to one or more tables of a tabular structure according to one or more embodiments. Method 300 as described assumes that the system is in an initial empty state, that is, without any existing data table or metadata. However, the method is not limited to a system that is initially empty, and can be implemented with systems in any previous state. In this way, the method can be used in persistent and durable systems in which information and metadata persist after a restart or system failure. In addition, the metadata referred to can be used later to reconstruct the data of the original document and/or reused to access the data.
Method 300 begins in step 302, in which one or more documents are provided for processing. In one or more embodiments, the documents are obtained from a data source, such as a database (for example, data source 140). For example, the documents can be one or more JSON or XML files. In other embodiments, the documents are provided locally, as in memory 115. For example, a user may enter information into local document storage in the data processing apparatus 105 through an input device, such as a keyboard. In step 304, a particular document is selected as input. For example, a document can be selected through filtering, as would be done by those of ordinary skill in the art, or through another selection process (for example, "first in / first out", in which the first document to enter is the first to exit; "first in / last out", in which the first document to enter is the last to exit; alphabetical order, etc.). The selected document is sent to a document reader, step 306. The document reader reads the documents to identify the data stored in them so that it can be processed and stored in tables. For example, the document reader identifies the key-value pairs stored within the document at the root level. The document reader can be any data reading device, such as the data processing apparatus 105.
In step 308, the identified document data is analyzed to determine whether it includes composite values. In one or more implementations, the composite value data is stored in a temporary data structure. For example, if the document reader identifies a key-value pair whose value is a document or a nested ordered grouping, the value is sent to a temporary data structure, such as a stack, until all key-value pairs that have scalar values at that level of the document have been processed. The temporary data structure is not limited to a stack; it may be another elementary data structure such as a linked list, a doubly linked list, an ordered grouping, a queue, etc. A stack structure is not preferred, but it is a valid embodiment as described herein. The analysis of document data, however (and the processing of the data in the following steps), does not require a temporary data structure. In other embodiments, programmatic techniques are implemented to identify composite value data. Example programming techniques that fall within the scope of the embodiments of the present invention include, but are not limited to, recursion.
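The two alternatives mentioned above, a temporary data structure and recursion, can be compared in a short sketch. Both functions below collect the scalar values of a parsed document along with a path to each one; the path notation, the function names and the sample document are assumptions made for illustration.

```python
from typing import Any, Iterator, List, Tuple

Pair = Tuple[str, Any]


def is_composite(value: Any) -> bool:
    return isinstance(value, (dict, list))


def children(path: str, value: Any) -> Iterator[Pair]:
    """Yield (path, value) for each direct child of a composite value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield (f"{path}.{key}" if path else key), child
    else:
        for index, child in enumerate(value):
            yield f"{path}[{index}]", child


def scalars_with_stack(doc: dict) -> List[Pair]:
    """Process each level's scalars first, deferring composites on a stack."""
    found: List[Pair] = []
    stack: List[Pair] = [("", doc)]
    while stack:
        path, value = stack.pop()
        for child_path, child in children(path, value):
            if is_composite(child):
                stack.append((child_path, child))  # handled after this level
            else:
                found.append((child_path, child))
    return found


def scalars_with_recursion(value: Any, path: str = "") -> List[Pair]:
    """Reach the same scalars by recursing into each composite as it is met."""
    found: List[Pair] = []
    for child_path, child in children(path, value):
        if is_composite(child):
            found.extend(scalars_with_recursion(child, child_path))
        else:
            found.append((child_path, child))
    return found


sample = {"a": 1, "b": {"c": 2, "d": [3, {"e": 4}]}, "f": None}
by_stack = scalars_with_stack(sample)
by_recursion = scalars_with_recursion(sample)
```

Both traversals find the same pairs and differ only in order: the stack version finishes every scalar at a level before descending, whereas the recursive version descends as soon as it meets a composite value.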
Continuing with Fig. 3, method 300 processes the non-composite values of the document data at the current level of the document to generate and populate the tabular structures. Non-composite values (for example, scalar value data) are processed first, because a document may have key-value pairs stored at multiple nested levels below the root level (for example, the base or non-nested level). Processing all the scalar values of the data before moving on to the next sub-level thus ensures that no key-value pair is omitted from the processing. In one or more embodiments, the method uses existing tabular structures, generates tabular structures, or uses a combination of existing and generated tabular structures, depending on whether the target storage system (for example, data source 140) already includes a reference to a tabular structure for those types of data. For each composite value (or "subdocument"), the method generates a reference to an additional tabular structure (or "tabular substructure" or "subtable") that is used to store the data of the sub-level.
In step 310, a structure extractor generates a reference to a table. For example, the structure extractor is the structure extractor 125. The reference to a table provides write and read access to the table and can be a reference to an existing table or to a table generated from the key name and the value type of the key-value pair (for example, the metadata). In one or more embodiments, the columns in the table are associated with the metadata. The structure extractor also generates a data row to store the data in the table, in which the row is mapped to specific columns in the table. Method 300 then continues and the data extractor extracts the value of a key-value pair, step 312. Method 300 then stores the extracted data value in the table, step 314, and the value is referenced with a corresponding set of metadata, step 316. For example, each unique key-value pair (for example, one with a scalar value) at a document level is processed and stored in a particular row and column according to the reference in the metadata table (for example, the type and name of the key-value pair). Method 300 then iterates for each scalar value at the current level of the document. If data with a composite value is identified at the current level of the document, method 300 processes the composite value by sending it to a temporary data structure (for example, a stack), as in step 308. After all scalar values of the key-value pairs have been processed, the method moves to a sub-level of the document path, given by the identifier of a composite value of a key-value pair in the temporary data structure, and executes the method again. This may include generating and populating subtables that are associated with the root level table.
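Steps 308 to 316 can be sketched together as one loop that fills data tables and records metadata for every stored value. This is a simplified in-memory illustration, not the specification's implementation: the function name, the metadata layout `(table, column, value type)`, the `value` column used for scalar members of an ordered grouping, and the hyphenated subtable names are all assumptions.

```python
from typing import Any, Dict, List, Set, Tuple

Tables = Dict[str, List[Dict[str, Any]]]
Metadata = Set[Tuple[str, str, str]]  # assumed layout: (table, column, value type)


def map_with_metadata(doc: Dict[str, Any], root: str) -> Tuple[Tables, Metadata]:
    """Map a parsed document to rows in tables and record metadata per column."""
    tables: Tables = {}
    metadata: Metadata = set()
    pending: List[Tuple[str, Dict[str, Any]]] = [(root, doc)]  # temporary data structure
    while pending:
        table, level = pending.pop()
        row: Dict[str, Any] = {}
        tables.setdefault(table, []).append(row)  # step 310: table reference and row
        for key, value in level.items():
            subtable = f"{table}-{key}"
            if isinstance(value, dict):
                pending.append((subtable, value))  # step 308: defer the composite
            elif isinstance(value, list):
                for item in value:
                    # Scalar members are wrapped under an assumed "value" column.
                    wrapped = item if isinstance(item, dict) else {"value": item}
                    pending.append((subtable, wrapped))
            else:
                row[key] = value                                    # steps 312 and 314
                metadata.add((table, key, type(value).__name__))    # step 316
    return tables, metadata


tables, metadata = map_with_metadata(
    {"name": "The Martian", "author": {"name": "Andy Weir"}}, "books"
)
```

The metadata set is what would later allow the stored rows to be tied back to their key names and types, for example when reconstructing the original document.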
Referring now to Fig. 4, the flow of method 400 illustrates the mapping of one or more key-value pairs associated with a document to one or more tabular structures according to a particular embodiment. In this example method flow, a document that has one or more key-value pairs is analyzed first at its root level, and the values of the scalar key-value pairs at the root level are processed and stored in a tabular structure. The values of the composite key-value pairs are sent to a temporary data structure until the root level is completed, and method 400 then moves to the sub-levels to process the composite key-value pairs into subsequent tabular substructures, each of which has a reference to the higher-level tabular structures on which it depends, up to the root level.
More precisely, the flow of method 400 begins in step 402, in which a data processing apparatus accesses a data source that has one or more documents. Each document typically has one or more key-value pairs, although method 400 can process documents that have no stored data. In that case, method 400 either produces no tabular structure or produces a single empty tabular structure, depending on the implementation. In addition, each document has one or more document levels depending on whether there are nested subdocuments (for example, composite key-value pairs) contained within the document data. Method 400 is capable of processing nested values into as many tabular substructures as there are nested levels. In step 404, a particular document is read from the data source. In one or more implementations, the document is read by a document reader (for example, the data processing apparatus 105). Next, a key-value pair is identified at the root level of the document, step 406. The order in which the key-value pairs are identified does not matter, which means that method 400 can first process any key-value pair at the current level of the path in the document, regardless of the position where the key-value pair is stored (for example, it does not have to process the first pair present in the document first).
Continuing with reference to Fig. 4, in step 408, method 400 generates a root-level tabular structure to serve as the storage location for values. For example, the generation of the tabular structure preferably includes generating a table having one or more columns and at least one row for storing the document data. In addition, when the tabular structure is generated, the name of the root-level tabular structure is defined as the identifier of the data source (for example, the name of the database) or the name of the collection where the document is stored, and at least one column of the tabular structure is defined according to the key name and the value type of the identified key-value pair. Then, the type of the value associated with the key-value pair is determined, step 410. For example, a data processing apparatus may implement program code (for example, a structure extractor 125) to determine whether the value is a scalar value (for example, an integer, a double, a boolean, a string, etc.) or a composite value (for example, an ordered grouping or a nested subdocument). If the value of the key-value pair is determined to be a scalar value, then the method proceeds to step 412 and the key name is extracted. For example, the key name is extracted and a column is generated in the tabular structure according to the column generation method 500, described below. Then, the value of the key-value pair is stored in a row of the tabular structure and mapped to the generated column, step 414.
However, if in step 410 the data processing apparatus determines that the value associated with the key-value pair is a composite value, the method proceeds to step 416 and the key name is extracted. For example, the key name is extracted and a column is generated in the tabular structure according to the column generation method 500, described below. Then, a tabular substructure associated with the extracted key name is generated, step 418. For example, the tabular substructure is generated in the same way as a root tabular structure (for example, as a table with columns and rows), except that the tabular substructure has a reference to the root tabular structure (and to any intervening tabular structure) through a table reference column (or "table_ref"). In one or more embodiments, the name of the tabular substructure is defined as the identifier of the data source or the name of the collection, followed by a separator character (an underscore in the example of Figs. 9-12, as in "book_author"), followed by the extracted key name. In one or more embodiments, a reference column that links the two is generated both in the tabular structure and in the tabular substructure. For example, a boolean type column is generated that links the tabular substructure as a child of the parent tabular structure, as shown in Figs. 8-12 and described in this document. Since composite values contain data stored in subdocuments, the values identified as composite values are stored in a temporary data structure, step 420, to ensure that all data in the document is processed and mapped to a table. For example, the temporary data structure can be an elementary data structure, such as a stack, a linked list, a doubly linked list, an ordered grouping, a queue, etc.


Whether the value of the key-value pair is determined to be a scalar value or a composite value, method 400 proceeds to step 422, in which the method determines whether there are additional key-value pairs to be processed at the current document level. According to method 400, before advancing to the next subdocument level, each of the scalar values of a given document level must be processed and stored in a tabular structure, and each of the composite values at the same level of the document must have a reference to its tabular substructure, with its nested values sent to a temporary data structure. If there are additional key-value pairs at the current level of the document that have not yet been processed, method 400 returns to step 410. If there are no more key-value pairs at the current level of the document, the method proceeds to step 424 and determines whether there is data in the temporary data structure. For example, if a composite value has been identified in steps 410-420, there will be data in the temporary data structure.
If method 400 determines that there is data in the temporary data structure, the method proceeds to step 426 and changes the value of the storage position to the generated tabular substructure. In other words, the method descends one level of the document (for example, from the root level to nested level 1, or from nested level 1 to nested level 2, etc.) so that the nested data can be stored. From this point, the method loops back to step 410 and processes the next level of data. Thus, method 400 provides a recursive way of traversing a document that contains key-value pairs. If, in step 424, method 400 determines that there is no more data in the temporary data structure, the method proceeds to step 428 and ends. For example, there is no more data in the temporary data structure when a document has no composite key-value pairs, or when all the composite key-value pairs in the document have been processed.
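By way of illustration only, the traversal of method 400 could be sketched in Python as follows. The function name map_document, the SUFFIX table and the use of a deque as the temporary data structure are assumptions made for this sketch and do not come from the figures. Ordered groupings are left out, so only nested subdocuments are treated as composite values, and the separator between table name and key name is assumed to be an underscore.

```python
from collections import deque

# Hypothetical mapping from Python value types to column-name suffixes.
SUFFIX = {bool: "_b", int: "_i", float: "_d", str: "_s"}

def map_document(doc, root_name):
    """Map one nested document to {table name: [row, ...]}."""
    tables = {}
    # Temporary data structure holding composite values (step 420).
    pending = deque([(root_name, doc)])
    while pending:                                  # step 424: data left?
        table_name, level = pending.pop()           # step 426: descend one level
        row = {}
        for key, value in level.items():            # steps 406-422
            if isinstance(value, dict):             # composite value (step 416)
                # Boolean column linking the parent table to the substructure.
                row[key + "_b"] = False
                pending.append((table_name + "_" + key, value))  # steps 418-420
            else:                                   # scalar value (steps 412-414)
                row[key + SUFFIX[type(value)]] = value
        tables.setdefault(table_name, []).append(row)
    return tables

if __name__ == "__main__":
    example = {"id": 5370, "name": "The Martian", "author": {"name": "Andy Weir"}}
    for name, rows in map_document(example, "book").items():
        print(name, rows)
```

The loop ends when the temporary structure is empty, which corresponds to step 428. A value whose type is not listed in SUFFIX raises a KeyError in this sketch; a fuller implementation would need a policy for such values.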
Referring now to Fig. 5, a flow of a column generation method 500 illustrates the generation of columns in a table for storing document data according to one or more embodiments of this document. Although the present invention allows the extraction of data from documents in a system with a pre-existing tabular state, the current embodiment starts from an empty initial state. In cases where no tables yet exist for the document data, the columns of the data tables are created, and a reference to the document data is created in the metadata, while the structure is extracted (for example, by the structure extractor 125, step 310 of method 300). The column generation method 500 begins at step 502, in which the key name and the data type of the key-value pair are extracted. This extraction includes extracting the key name, the value and the type of the value (for example, single, double, string, boolean, etc.) from a key-value pair. For example, in step 502, if the key-value pair being mapped has the key name "pizza" and a string value containing "pepperoni", then "pizza" is extracted to generate the column, "pepperoni" is extracted to be stored in a row of the table, and the suffix "_s" is extracted for the generated column. Other data types have different suffixes that depend on the type of the value, such as "_i" for integers, "_d" for doubles, "_b" for booleans, etc.
Then, in step 504, the name of the column is calculated. For example, continuing with the pizza example, the column name "pizza_s" is calculated. Method 500 then determines whether a generated column with the same calculated name already exists, step 506. If the column does not exist, method 500 creates a column with the calculated name, step 508. If the column exists, method 500 terminates, in step 510. From this point, the extracted data is stored in the column. For example, "pepperoni" would be stored in a row of data and mapped to "pizza_s" in the previous example. In addition, in one or more embodiments, if the extracted key-value pair is at the root level of the document, the name of the table is the name of the collection, or the name given by any equivalent naming mechanism to the set of related documents. For example, if the collection that stores the example key-value pair "pizza" is called "foods", then the name of the table will be "foods".
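Steps 502 to 510 can be illustrated with a short Python sketch. The helper names type_suffix, column_name and ensure_column, and the representation of a table's columns as a plain list, are hypothetical; only the suffixes "_s", "_i", "_d" and "_b" come from the description above.

```python
def type_suffix(value):
    """Return the column suffix for a scalar value (step 502)."""
    # bool is tested before int because, in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "_b"
    if isinstance(value, int):
        return "_i"
    if isinstance(value, float):
        return "_d"
    if isinstance(value, str):
        return "_s"
    raise TypeError(f"no suffix defined for {type(value).__name__}")

def column_name(key, value):
    """Calculate the column name from the key name and value type (step 504)."""
    return key + type_suffix(value)

def ensure_column(columns, key, value):
    """Create the column only if it does not exist yet (steps 506-510)."""
    name = column_name(key, value)
    if name not in columns:      # step 506
        columns.append(name)     # step 508
    return name                  # step 510

if __name__ == "__main__":
    foods_columns = []
    print(ensure_column(foods_columns, "pizza", "pepperoni"))
    print(foods_columns)
```

Calling ensure_column a second time with the same key and value type leaves the column list unchanged, which mirrors the early termination in step 510.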
If documents contain key-value pairs that have an identical key name but conflicting value types, the method of generating the tabular structure takes specific steps in one or more embodiments. Such a conflict is a common problem in conventional document database systems. Two examples of such a case are illustrated in Fig. 6. In the first example, the input document includes a first key-value pair 602 that has the key name "key1" and an integer value of 33, and a second key-value pair 604 that also has the key name "key1" but has a string value of "foo". The column generation method 500 then generates a first column 606, "key1_i", corresponding to the first key-value pair 602, and a second column 608, "key1_s", corresponding to the second key-value pair 604. In this case, the type conflict requires no additional steps beyond those of method 500, since the method generates two columns with different names even though both key-value pairs have the same key name. In the second example, a first set of key-value pairs 610 includes a first key-value pair 612 with the key name "name" and the string value "Joe", and a second key-value pair 614 with the key name "age" and the integer value 35. A second set of key-value pairs 616 includes a third key-value pair 618 with the key name "name" and the string value "Eve", and a fourth key-value pair 620 with the key name "age" and the string value "25". When the column generation method 500 is applied, since the first key-value pair 612 and the third key-value pair 618 have the same key name and value type, there is no conflict: a first column 622 is generated with the metadata reference "name_s", and "Joe" and "Eve" are stored in rows of data in their order of processing. In contrast, the second key-value pair 614 and the fourth key-value pair 620 have a type conflict, because they have the same key name "age" but the second key-value pair is an integer while the fourth key-value pair is a string. To remedy this, two separate columns are generated: a second column 624 with the reference "age_i" and a third column 626 with the reference "age_s". The respective values are mapped to those columns, and in each row the column for which the row has no key-value pair is left empty, as shown in Fig. 6.
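The second example of Fig. 6 can be reproduced with the following Python sketch. The function build_table and the SUFFIX table are assumptions of the sketch, and an empty cell is represented by None.

```python
# Hypothetical mapping from Python value types to column-name suffixes.
SUFFIX = {bool: "_b", int: "_i", float: "_d", str: "_s"}

def build_table(documents):
    """Map flat documents to (columns, rows), one column per key and value type."""
    columns = []
    sparse_rows = []
    for doc in documents:
        row = {}
        for key, value in doc.items():
            name = key + SUFFIX[type(value)]
            if name not in columns:
                columns.append(name)   # a conflicting type yields a new column
            row[name] = value
        sparse_rows.append(row)
    # Cells for columns that a row does not use are left empty (None).
    rows = [[row.get(name) for name in columns] for row in sparse_rows]
    return columns, rows

if __name__ == "__main__":
    people = [{"name": "Joe", "age": 35}, {"name": "Eve", "age": "25"}]
    columns, rows = build_table(people)
    print(columns)
    for row in rows:
        print(row)
```

Because the value type is part of the column name, no value is ever coerced; the integer 35 and the string "25" each keep their original type and can be told apart when the document is reconstructed.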
To perform the mapping in both directions between the key-value pairs and the tabular system described above, the systems and methods of the present application use metadata references. In one or more embodiments, the metadata is stored in one or more metadata tables that reference the column definitions of the data tables representing the document data. The metadata serves two fundamental purposes. First, since the metadata references the columns of the tables that represent the document in its original form, it provides a link to the structure of the original document, thus allowing its reconstruction. Second, the references in the metadata are implementation-independent, which means that they can refer to different tabular structures or relational databases, whether these are fully persisted, partially persisted, or not persisted.
Referring now to Fig. 7, an example of a set of metadata tables according to one or more embodiments is illustrated. This set of metadata tables is merely one way in which metadata can refer to the data tables of documents, and it is not intended to limit this application to that particular set of tables. A data source metadata table 702 is provided to store references to the data sources that contain a given document. It includes two columns: one defined as "name", which holds the name assigned by the user to the data source, and one defined as "identifier", which holds an internal identifier generated by the data mapping method of this document. Typically, these two are the same, although the "identifier" column may differ in implementations in which the data source is a persistent database, because the names of databases can be accessed and modified in persistent databases. In one or more embodiments, the "identifier" column is generated to work around limitations inherent in the storage backend of the data source. For example, the "identifier" column is generated if the backend does not allow certain identifiers or has size limitations. A collection metadata table 704 is provided when the data source is a database that has one or more collections, to store references to the particular collection that contains a given document in the database. The collection metadata table 704 includes the same columns as the data source metadata table 702, and adds a third column, "database", that references the data source metadata table.
While a document is processed, for example by method 300 or method 400, its level structure and its subdocuments (for example, composite values, nested documents) are saved and mapped to a particular table structure. A document index metadata table 706 is provided to store references to the data stored in the paths of the documents and of the subdocuments they contain. A document path is an ordered list, starting from the root level, of the keys that must be traversed to reach a given value. For example, the document index metadata table 706 includes a "database" column as in the previous case, a "collection" column that references the collection metadata table 704, a "table_ref" column that references the path within the document, an "identifier" column that holds the internal identifier of the document path and matches at least one generated data table, and a "last_rid" column that references the last row in the identified generated data table.


Each time a key-value pair in a document is processed, a reference between the key of the original document and the identifier used by the system is stored. For example, this reference is stored in a key metadata table 708. The key metadata table 708 can include a "name" column that references the key name of the key-value pair in the document (and not the name of the data source, since that is referenced in the "database" column), a "type" column that references the type of the data of the key-value pair (either scalar or composite), and an "identifier" column that holds the internal identifier of the key. Each row matches a row in the document index metadata table 706 through the columns "database", "collection" and "table_ref".
Because documents can include both scalar values and composite values, an ordered grouping metadata table 710 can be included to account for key-value pairs whose value is an ordered grouping of scalar values. The ordered grouping metadata table 710 can be linked through the columns "database", "collection" and "table_ref", as in the previous case. In addition, the ordered grouping metadata table 710 may include a "type" column indicating the type of data stored in the ordered grouping, with a reference for each row that the ordered grouping produces in the table, and an "identifier" column that references the internal name of the column that stores the value in the table related to the path in the document.
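One possible way to picture the metadata tables of Fig. 7 is the following Python sketch, which uses the standard sqlite3 module with an in-memory database. The table names (meta_database, meta_collection, meta_doc_part, meta_field), the database name "library", the type labels and the storage of "table_ref" as plain text are all assumptions of the sketch; only the column names are taken from the description above. The ordered grouping metadata table is omitted for brevity.

```python
import sqlite3

# Hypothetical table names; the column names follow the description of Fig. 7.
SCHEMA = """
CREATE TABLE meta_database   ("name" TEXT, "identifier" TEXT);
CREATE TABLE meta_collection ("database" TEXT, "name" TEXT, "identifier" TEXT);
CREATE TABLE meta_doc_part   ("database" TEXT, "collection" TEXT, "table_ref" TEXT,
                              "identifier" TEXT, "last_rid" INTEGER);
CREATE TABLE meta_field      ("database" TEXT, "collection" TEXT, "table_ref" TEXT,
                              "name" TEXT, "type" TEXT, "identifier" TEXT);
"""

def open_metadata():
    """Create the metadata tables and fill them with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO meta_database VALUES ('library', 'library')")
    conn.execute("INSERT INTO meta_collection VALUES ('library', 'book', 'book')")
    conn.executemany("INSERT INTO meta_doc_part VALUES (?, ?, ?, ?, ?)", [
        ("library", "book", "", "book", 0),              # root level, empty path
        ("library", "book", "author", "book_author", 0),
    ])
    conn.executemany("INSERT INTO meta_field VALUES (?, ?, ?, ?, ?, ?)", [
        ("library", "book", "", "name", "string", "name_s"),
        ("library", "book", "author", "name", "string", "name_s"),
    ])
    return conn

def resolve(conn, database, collection, table_ref, key):
    """Return (data table, column) pairs that store `key` at the given path."""
    query = """
        SELECT p."identifier", f."identifier"
        FROM meta_doc_part AS p
        JOIN meta_field AS f
          ON f."database" = p."database"
         AND f."collection" = p."collection"
         AND f."table_ref" = p."table_ref"
        WHERE p."database" = ? AND p."collection" = ?
          AND p."table_ref" = ? AND f."name" = ?
    """
    return conn.execute(query, (database, collection, table_ref, key)).fetchall()

if __name__ == "__main__":
    conn = open_metadata()
    print(resolve(conn, "library", "book", "author", "name"))
```

The lookup joins the document index metadata and the key metadata on "database", "collection" and "table_ref", which is the relationship the description attributes to those three columns.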
In addition, Fig. 7 illustrates four automatically generated columns 712 for the tabular structures of the data according to one or more embodiments. These automatically generated columns 712 are considered metadata and store the relationship between all the values in the same document. For example, as illustrated in Fig. 7, the "did" column is a document identifier that uniquely identifies the document, and all rows in each of the tables belonging to the same document share the same "did" value; the "rid" column identifies the rows within the same table, to distinguish the cases in which a table contains more than one row with the same document identifier (for example, an ordered grouping); the "pid" column is a parent identifier that references the parent row (for example, the row on which the nested key-value pairs depend); and the "seq" column records the order of the elements of an ordered grouping (for example, to reconstruct the document in its original format if necessary). However, depending on the level of the table and the content of the document, not all of these columns are necessary. For example, at the root level of a document only the "did" column is necessary, since the order is not important, there can be only a single row, and there is no parent row on which it depends. Similarly, a second-level subdocument (for example, a first nested document) needs only the "did" and "rid" columns, as long as the subdocument is not an ordered grouping. In one or more implementations, the present application performs automatic optimizations so that only the necessary columns are generated. This saves computing resources, such as memory and storage space, which is particularly advantageous for persistent databases that can be modified on an ongoing basis.
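The selection of automatically generated columns could be expressed as a small rule, sketched below in Python. The function name automatic_columns and its parameters are hypothetical. The rules for the root level and for the first nested level follow the two examples given above; adding "pid" from the second nested level onward and adding "seq" for ordered groupings are extrapolations assumed for this sketch.

```python
def automatic_columns(depth, in_ordered_grouping):
    """Return the automatically generated columns needed by a data table.

    depth is 0 for the root level table and 1 for the first nested level.
    """
    columns = ["did"]              # every row carries its document identifier
    if depth >= 1:
        columns.append("rid")      # several rows may share the same "did"
    if depth >= 2:
        columns.append("pid")      # assumption: deeper rows point at a parent row
    if in_ordered_grouping:
        columns.append("seq")      # assumption: element order must be kept
    return columns

if __name__ == "__main__":
    print(automatic_columns(0, False))
    print(automatic_columns(1, False))
    print(automatic_columns(2, True))
```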
With reference to Figs. 8-12, an example document 800 having multiple key-value pairs is processed, and the data contained in the key-value pairs is mapped to a generated tabular structure. This example is provided by way of illustration, to describe more fully the systems and methods of one or more embodiments described in this document; it is not intended to limit the application to this example.
Referring to Fig. 8, document 800 is received from the data source for processing. Document 800 contains data identifying the book "The Martian" by Andy Weir. For example, document 800 may be a record stored in a library database. In this example, document 800 includes a name 805, "book". The name 805 can be assigned by the user, or assigned automatically as a reference to the collection where the document is stored or to the data source itself. There are three key-value pairs stored at the root level of the example document 800: a first key-value pair 810 that indicates the value identifying the book, a second key-value pair 815 that indicates the name of the book, and a third key-value pair 820 that indicates the author of the book. The first key-value pair 810 has the key "id" and the scalar value 5370, stored as a double type. The second key-value pair 815 has the key "name" and the scalar value "The Martian", stored as a string type. The third key-value pair 820 has the key "author" and a composite value, a nested subdocument containing a fourth key-value pair 825 (indicating the name of the author of the book) and a fifth key-value pair 830 (indicating the identifiers of other books by the same author). Since the third key-value pair is a composite value, document 800 stores the fourth key-value pair 825 and the fifth key-value pair 830 in a sub-level. The fourth key-value pair 825 has the scalar value "Andy Weir", stored as a string type. The fifth key-value pair 830 has a composite value, an ordered grouping that contains three scalar values, each stored in a further sub-level.
Turning now to Fig. 9, the tabular structure generated and mapped from document 800 is illustrated, according to the methods for extracting the data from the document and mapping it to the tabular structure described in one or more embodiments herein (for example, by method 300 or method 400). For this example document 800, the five key-value pairs are processed and mapped to three tables according to the nesting of the document data: a root level table 910, a first sub-level table 920 that depends on the root level table, and a second sub-level table 930 that depends on the first sub-level table. As shown in Fig. 9, the first key-value pair 810 and the second key-value pair 815, stored at the root level of document 800, are mapped to the root level table 910; the third key-value pair 820 is mapped to the first sub-level table 920, because of the nested subdocument it contains; and the fourth key-value pair 825 and the fifth key-value pair 830 are mapped to the second sub-level table 930. The information used to populate the column names and row data of the tabular structure, and thus to provide the structure and definition of each table, is supplied by the metadata references, as described with reference to Figs. 5-7.
Referring now to Fig. 10, the generation of the root level table 910 and the mapping of the first key-value pair 810 and the second key-value pair 815 are illustrated. To avoid possible limitations imposed by the storage solution used, an internal name is assigned to each identifier that appears and is used throughout the data mapping process. In one or more embodiments, the level structure of document 800 is processed to generate one or more metadata tables that provide an index of the paths in the document for the key-value pairs stored there. For example, the reference between the original name and the internal name is stored in the metadata tables. In the example, a doc_part metadata table 1010 provides reference identifiers that define the structure of the root level table 910. The root level table 910 in this example is called "book", based on a root level identifier that is selected according to the collection or data source in which document 800 is stored. The "table_ref" column of the doc_part metadata table 1010 is an empty set, since the root level table 910 does not depend on any higher-order table (it is at the root level).
The field metadata table 1020 defines the structure of each additional column in the root level table 910 according to the column generation method 500. As illustrated in Fig. 10, the root level table 910 refers to the field metadata table 1020 to generate the columns "id_i" (which references the first key-value pair 810 and the integer value type), "name_s" (which references the second key-value pair 815 and the string value type), and "author_b" (which references the third key-value pair 820 and the boolean value type). In other embodiments in which the data is mapped to a pre-existing table (for example, a root level table called "book" that already exists), these steps for generating column metadata are not necessary if the type of the key-value pair matches. From this point, a row of data is generated in the root level table 910 and populated with the data values of the root level key-value pairs (for example, 5370, "The Martian", and false, which means that this value is not scalar). For composite values, such as the third key-value pair 820, the values are not stored in the root level table 910. Instead, these values are sent to a temporary data structure for further processing. The use of a temporary data structure, for example a stack, is shown in Fig. 13 and described in full below. The process continues until every key-value pair at the root level has been extracted and mapped to the root level table 910.
Referring now to Fig. 11, the generation of the first sub-level table 920 and the mapping of the third key-value pair 820 and the fourth key-value pair 825 are illustrated. The methodology for generating the first sub-level table 920 is similar to that for generating the root level table 910, except that additional references to column metadata are generated. In the example, the doc_part metadata table 1010 provides the reference identifiers that define the structure of the first sub-level table 920 (for example, a path in the document). For example, the first sub-level table 920 of this example is called "book_author", since the "table_ref" column of the doc_part metadata table 1010 now includes a reference to "author", which is the key name of the third key-value pair 820 and the current document level. As discussed above, the field metadata table 1020 defines the structure of each additional column of the first sub-level table 920 according to the column generation method 500. Here, the first sub-level table 920 refers to the field metadata table 1020 to generate the columns "name_s" (which references the fourth key-value pair 825 and the string type) and "books_b" (which references the fifth key-value pair 830 and the boolean type). From here, a row of data is generated in the first sub-level table 920 and populated with the data values of the key-value pairs at the "book_author" level (for example, "Andy Weir" and true, which means that this value is not a scalar value and has only scalar values nested in it). The composite value of the fifth key-value pair is moved to a temporary data structure to be processed later. This temporary data structure may be the same as, or different from, the temporary data structure that receives the composite values sent from the root level.
Referring now to Fig. 12, the generation of the second sub-level table 930 and the mapping of the fifth key-value pair 830 are illustrated. As at the "book_author" level, the doc_part metadata table 1010 provides an identifier that defines the second sub-level table 930. In this case, the fifth key-value pair 830 has the name "books", which is added to the "table_ref" column of the doc_part metadata table 1010 to generate the name of the second sub-level table 930, "book_author_books". The fifth key-value pair 830 is an ordered grouping of three scalar values, and the methodology provided here generates a scalar metadata table 1210 with a column representing the double value type of the fifth key-value pair. As in the root level table 910 and the first sub-level table 920, rows of data are generated to store the processed values of the fifth key-value pair 830; however, since there are three scalar values, three rows of data are generated and the values are mapped to the column "v_d" (that is, "value" of type double). To preserve the order of these values, the second sub-level table 930 includes a "rid" column (to identify the row) and a "seq" column (to identify the position of each value in the ordered grouping). Since there is no other composite value in document 800, the process ends, and the three data tables created are stored in the database or in another desired storage system.
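As an illustration of the naming of the second sub-level table and of the rows generated for an ordered grouping, consider the following Python sketch. The helper names table_name and array_rows are hypothetical, the three values 10.0, 20.0 and 30.0 are placeholders (the actual values appear only in the figures), and numbering "rid" and "seq" from zero is an assumption.

```python
def table_name(root, path):
    """Join the root identifier and the document path with underscores."""
    return "_".join([root, *path])

def array_rows(values, did):
    """Generate one row per element of an ordered grouping of doubles."""
    rows = []
    for seq, value in enumerate(values):
        rows.append({
            "did": did,           # shared by all rows of the same document
            "rid": seq,           # row identifier within this table
            "seq": seq,           # position in the ordered grouping
            "v_d": float(value),  # "value" column of type double
        })
    return rows

if __name__ == "__main__":
    print(table_name("book", ["author", "books"]))
    for row in array_rows([10.0, 20.0, 30.0], did=1):
        print(row)
```

In this sketch "rid" and "seq" coincide because the table holds a single ordered grouping of one document; they would diverge as soon as rows from several groupings or several documents share the table.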
Referring now to Fig. 13, an example stack operation 1300 for processing the temporary data structure is illustrated. In one or more implementations, a stack operation 1300 is implemented to temporarily store the data of a composite value while the data of a scalar value is processed; it thereby improves the efficiency of data mapping and ensures that no data of the document is omitted. Other temporary data structures are suitable for use with embodiments of the present invention, such as linked lists, doubly linked lists, queues, ordered groupings, and the like. A stack operation 1300 implements a LIFO ("last in, first out") processing methodology, in which the process pushes data onto the stack until the most appropriate moment for processing, at which point the data is popped from the stack and processed. In one or more embodiments, the stack operation 1300 is independent of order, which means that the key-value pairs of a document can be processed in any order according to the level of the path in the document.
Fig. 13 illustrates the path in the document for document 1305, which has eight key-value pairs, some of which are nested composite values. In this example, keys 1, 3, 4, 6 and 8 have scalar values, and keys 2, 5 and 7 have composite values. Since the stack operation 1300 is independent of order, Fig. 13 illustrates two different processing orders according to a particular embodiment. Regardless of the order and the moment in which the key-value pairs are processed, the result of the mapping is identical.
In the first order 1310, the key-value pairs are processed in their order of arrival along the path in the document. In the first order 1310, the stack is initially empty. In the first iteration of stack operation 1300, the root level of document 1305 is read, and each scalar value is processed according to the extraction and mapping methods described herein. In this case, the only scalar value at the root level is key 1. Keys 2 and 7 are composite values and are pushed onto the stack so as to preserve their order of arrival, which means that, since key 2 arrives first in the document path, it is placed above key 7 so that it is processed first. Then, the stack operation performs a second iteration to process the key-value pairs stored there. Since stack operation 1300 is a LIFO operation, the composite value at the head of the stack, in this example key 2, is processed first. Key 2 includes two scalar values (key 3 and key 4), which are extracted and mapped to a data table. Key 5 is a nested subdocument and is pushed onto the head of the stack. Key 5 is now the last key-value pair pushed, so the second iteration interrupts the processing of key 2 and stack operation 1300 iterates a third time to process key 5. Key 5 contains a single scalar, key 6, which is processed. Since there are no more key-value pairs at this level, key 5 is removed from the stack and the first order 1310 returns to the point at which the second iteration was interrupted. Key 2 has no more key-value pairs either, so key 2 is removed from the stack. The first order 1310 then iterates a fourth time to process key 7, which has a single scalar key, key 8. Then, with no data left in the stack, stack operation 1300 ends.
In the second order 1320, an entire level is processed before continuing with the composite values stored in the stack at that level and earlier. As before, the second order 1320 begins with an empty stack. In the first iteration, stack operation 1300 identifies two composite values, key 2 and key 7, and pushes them onto the stack in the order in which they are read, which means that key 2 is read and pushed first and key 7 ends up at the head of the stack. In the second iteration, stack operation 1300 processes the data of key 7 and pops it from the stack. In the third iteration, stack operation 1300 processes the data of key 2, identifies key 5 as a composite value, and pushes key 5 onto the stack. In the fourth iteration, stack operation 1300 processes the data of key 5 and, to complete, pops the structure of key 5, at which point the processing of the data of key 2 ends.
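The order independence described for Fig. 13 can be checked with a small Python sketch. The function map_levels, its reverse_push flag, the root name "doc" and the scalar values are assumptions made for illustration; the shape of the document (keys 1 to 8) follows the description of document 1305. With reverse_push set to True the composite values are visited in the sequence of the first order 1310 (key 2, key 5, key 7), and with False in the sequence of the second order 1320 (key 7, key 2, key 5).

```python
def map_levels(doc, root, reverse_push):
    """Map each document level to a table, using a list as a LIFO stack."""
    tables = {}
    visit_order = []
    stack = [(root, doc)]
    while stack:
        name, level = stack.pop()          # LIFO: take the head of the stack
        visit_order.append(name)
        # Scalar values of this level are mapped immediately.
        tables[name] = {k: v for k, v in level.items() if not isinstance(v, dict)}
        # Composite values are pushed for later processing.
        children = [(name + "_" + k, v) for k, v in level.items()
                    if isinstance(v, dict)]
        stack.extend(reversed(children) if reverse_push else children)
    return tables, visit_order

# Placeholder values; only the nesting follows document 1305.
DOCUMENT_1305 = {
    "key1": 1,
    "key2": {"key3": 3, "key4": 4, "key5": {"key6": 6}},
    "key7": {"key8": 8},
}

if __name__ == "__main__":
    first_tables, first_order = map_levels(DOCUMENT_1305, "doc", reverse_push=True)
    second_tables, second_order = map_levels(DOCUMENT_1305, "doc", reverse_push=False)
    print(first_order)
    print(second_order)
    print(first_tables == second_tables)
```

The two runs visit the levels in different sequences but produce the same set of tables, which is the property the description attributes to stack operation 1300.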
Figures 1 through 13 are conceptual illustrations that provide an explanation of the present invention. Those skilled in the art will understand that the various aspects of the implementations of the present invention can be implemented in hardware, firmware, software, or combinations thereof. In such implementations, the various components and/or steps would be implemented in hardware, firmware, and/or software to perform the functions of the present invention. That is, the same piece of hardware, firmware, or software module can perform one or more of the illustrated blocks (for example, components or steps).
In software implementations, computer software (for example, programs or other instructions) and/or data is stored on a machine-readable medium as part of a computer program product, and is loaded into a computer system or other device or machine through a removable storage device, a hard drive, or a communication interface. Computer programs (also called computer control logic or computer-readable program code) are stored in a main and/or secondary memory and executed by one or more processors (controllers, or the like) to cause the one or more processors to perform the functions of the invention as described in this document. In this document, the terms "computer readable medium", "computer program medium" and "computer usable medium" are used generally to refer to media such as a random access memory ("RAM"); a read-only memory ("ROM"); a removable storage unit (for example, a magnetic or optical disk, a flash memory device, or the like); a hard drive; or the like.
Notably, the figures and examples above are not intended to limit the scope of the present invention to a single implementation, since other implementations are possible by interchanging some or all of the described or illustrated elements. In addition, where certain elements of the present invention can be partially or fully implemented using known components, only those parts are described, and further detail is omitted for the sake of readability. Herein, an implementation that shows a singular component should not necessarily be taken as excluding other implementations that include a plurality of the same component, and vice versa, unless explicitly stated otherwise in this document. In addition, the applicants do not intend any term in the specification or claims to be given a special meaning unless expressly stated as such. In addition, the present invention encompasses present and future known equivalents of the known components referred to herein by way of example.
The foregoing description of the implementations reveals the general nature of the invention so fully that others can, by applying knowledge within the relevant art or arts (including the contents of the documents cited and incorporated by reference in this document), modify and/or adapt such specific implementations for various applications, without undue experimentation and without departing from the general concept of the present invention. Such adaptations and modifications are therefore intended to fall within the meaning and range of equivalents of the disclosed implementations, based on the teachings and guidance presented in this document. It should be understood that the wording or terminology of this document is for the purpose of description and not of limitation, so that the wording or terminology of this specification is to be interpreted by the person skilled in the art in the light of the teachings and guidance presented in this document, together with the knowledge of a person skilled in the relevant art.


While several implementations of the present invention have been described above, it should be understood that they have been presented by way of example and not limitation. It will be apparent to a person skilled in the art that various changes in form and detail can be made without departing from the spirit and scope of the invention. Therefore, the present invention should not be limited by any of the example implementations described above; instead, it should be defined only in accordance with the following claims and their equivalents.

Claims:
Claims (15)
[1]
1. Method of mapping one or more key-value pairs associated with a document into one or more tabular structures, wherein each key-value pair has a key name and a value, and each tabular structure has one or more rows and columns to store the values, the method comprising:
reading the document, by a document reader, to identify the key-value pair(s) associated with the input document; determining whether the value associated with each of the one or more key-value pairs associated with the input document is a scalar value or a composite value; in the event that the value associated with the key-value pair is a scalar value:
extracting the key name of the key-value pair and storing the value of the key-value pair in a row of the tabular structure; or, in the event that the value associated with the key-value pair is a composite value:
extracting the key name of the key-value pair and generating a tabular substructure associated with the key name extracted from the key-value pair.
[2]
2. Method according to claim 1, further comprising:
in the event that the value associated with a key-value pair is a composite value: sending the value of the key-value pair to a temporary data structure.
[3]
3. A method according to claim 2, further comprising: determining whether the composite value associated with a key-value pair that is sent to the temporary data structure includes one or more key-value sub-pairs; determining, for each of the one or more key-value sub-pairs, whether its value is a scalar value or a composite value; in the event that the value associated with a key-value sub-pair is a scalar value:
extracting the key name of the key-value sub-pair and checking whether the tabular substructure has a column associated with the key name extracted from the key-value sub-pair; if so, storing the value of the key-value sub-pair in a row of the tabular substructure and mapping the value to that column; if not, generating a new column associated with the tabular substructure, storing the value of the key-value sub-pair in a row of the tabular substructure and mapping the value to the new column; in the event that the value associated with the key-value sub-pair is a composite value:
extracting the key name of the key-value sub-pair; generating a new tabular substructure associated with the key name extracted from the key-value sub-pair; and sending the value of the key-value sub-pair to the temporary data structure;
in the event that no value associated with the key-value sub-pairs is a composite value, removing the key-value pair from the temporary data structure.
[4]
4. Method according to claim 3, further comprising iterating the steps of claim 3 until the temporary data structure does not contain any key-value pairs sent.
[5]
5. Method according to claim 1, further comprising, in the event that the value associated with the key-value pair is a scalar value, checking whether the tabular structure has a column associated with the key name extracted from the key-value pair.
[6]
6. Method according to claim 5, further comprising generating a new column associated with the tabular structure and storing the value of the key-value pair in a row of the tabular structure.
[7]
7. Method according to claim 5, wherein the column is identified by a type associated with the value of the key-value pair.
[8]
8. Method according to claim 1, wherein the document is in JSON format.
[9]
9. Method according to claim 1, wherein the document is in XML format.
[10]
10. Method according to claim 1, wherein the tabular structure is a persistent storage.
[11]
11. Method according to claim 1, wherein the document is received from a data source.
[12]
12. Method according to claim 1, wherein the data source is a relational database.

[13]
13. Method according to claim 1, wherein the temporary data structure is a stack.
[14]
14. Method according to claim 1, wherein the temporary data structure is a linked list.
[15]
15. Method according to claim 1, further comprising: in the event that the value associated with the key-value pair is a composite value: recursively iterating the steps of claim 1 until each of the one or more key-value pairs is extracted to a tabular structure or tabular substructure.
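
The mapping steps of the method claims above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the function names `is_composite` and `map_document`, the `parent_key` table-naming scheme, the `value` column used for arrays of scalars, and the use of a Python list as the stack are all assumptions made for illustration. Row identifiers linking sub-tables to their parent rows are omitted for brevity.

```python
from collections import defaultdict


def is_composite(value):
    """Composite values are nested objects or arrays; anything else is scalar."""
    return isinstance(value, (dict, list))


def map_document(doc, root="root"):
    """Map a parsed JSON-like document into {table_name: [row, ...]}.

    Hypothetical helper: table names and array handling are assumptions.
    """
    tables = defaultdict(list)
    stack = [(root, doc)]  # temporary data structure (a stack)
    while stack:  # iterate until no composite value is pending
        table, value = stack.pop()
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, list):
                # Nested array: defer it to the same table.
                stack.append((table, item))
                continue
            if not isinstance(item, dict):
                # Scalar array element: stored under an assumed "value" column.
                tables[table].append({"value": item})
                continue
            row = {}
            for key, child in item.items():
                if is_composite(child):
                    # Composite: create a sub-table named after the key
                    # and send the value to the temporary data structure.
                    stack.append((f"{table}_{key}", child))
                else:
                    # Scalar: the key name becomes a column of the row.
                    row[key] = child
            tables[table].append(row)
    return dict(tables)
```

For example, calling `map_document({"name": "Ada", "address": {"city": "Madrid"}})` yields one row in a `root` table holding the scalar `name`, and one row in a `root_address` sub-table holding `city`. A stack gives depth-first processing; a queue or linked list would work equally well for this sketch.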
[16]
16. A system for extracting and storing data from a document in a database, where the system comprises:
a data processing apparatus that includes a processor and a memory coupled to the processor;
a data source that contains one or more documents, where each document has one or more key-value pairs, each key-value pair has a key name and a value, and each value has a value type;
a document reader to receive the document from the data source through a network, the document reader being communicatively coupled to the data processing apparatus;
a structure extraction module that implements program code executed by the processor to generate one or more tabular structures that have at least one column corresponding to the key name and the value type of each of the one or more key-value pairs;
a data extraction module that implements program code executed by the processor to extract the value of a row of data in the one or more tabular structures created by the structure extraction module; and
one or more metadata tables generated by the structure extraction module with reference to the one or more tabular structures.
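
As a rough illustration of how a structure extraction module might grow tables column by column while recording them in a metadata table, the following sketch uses Python's standard `sqlite3` module. Everything specific here is an assumption rather than part of the disclosure: the helper names `type_name` and `store_rows`, the `metadata` table layout, the `row_id` column, and the mapping from Python types to SQLite column types. Key names are interpolated into SQL as quoted identifiers, so the sketch assumes trusted keys that contain no double quotes and do not collide with `row_id`.

```python
import sqlite3


def type_name(value):
    """Assumed mapping from a scalar's Python type to an SQLite column type."""
    if isinstance(value, (bool, int)):
        return "INTEGER"
    if isinstance(value, float):
        return "REAL"
    return "TEXT"


def store_rows(conn, table, rows):
    """Store rows of scalars, adding columns and metadata entries on demand."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(table_name TEXT, column_name TEXT, column_type TEXT)"
    )
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (row_id INTEGER PRIMARY KEY)')
    known = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM metadata WHERE table_name = ?", (table,)
        )
    }
    for row in rows:
        for key, value in row.items():
            if key not in known:
                # No column for this key yet: create it and record it.
                ctype = type_name(value)
                conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{key}" {ctype}')
                conn.execute(
                    "INSERT INTO metadata VALUES (?, ?, ?)", (table, key, ctype)
                )
                known.add(key)
        if row:
            cols = ", ".join(f'"{k}"' for k in row)
            marks = ", ".join("?" for _ in row)
            conn.execute(
                f'INSERT INTO "{table}" ({cols}) VALUES ({marks})',
                list(row.values()),
            )
        else:
            conn.execute(f'INSERT INTO "{table}" DEFAULT VALUES')
```

With an in-memory connection from `sqlite3.connect(":memory:")`, storing two rows with different keys leaves the absent values as NULL, and the `metadata` table lists each generated column with its type, which is the information a reconstruction step would need to rebuild the original document.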

Patent family:
Publication number | Publication date
US9830319B1|2017-11-28|
ES2668723B1|2019-04-01|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title
US20120246190A1|2011-03-23|2012-09-27|Manik Surtani|System and method for performing object relational mapping for a data grid|
CA2860322A1|2011-12-23|2013-06-27|Amiato, Inc.|Scalable analysis platform for semi-structured data|
US20160246861A1|2012-07-26|2016-08-25|Mongodb, Inc.|Aggregation framework system architecture and method|
US9418085B1|2013-03-13|2016-08-16|Amazon Technologies, Inc.|Automatic table schema generation|
US20160275201A1|2015-03-18|2016-09-22|Adp, Llc|Database structure for distributed key-value pair, document and graph models|
US7590620B1|2004-06-18|2009-09-15|Google Inc.|System and method for analyzing data records|
CN108776587B|2018-05-25|2020-07-17|平安科技(深圳)有限公司|Data acquisition method and device, computer equipment and storage medium|
CN112307721A|2020-10-30|2021-02-02|广州朗国电子科技有限公司|Method for quickly converting third-party interface data into customized form and storage medium|
Legal status:
2019-04-01| FG2A| Definitive protection|Ref document number: 2668723 Country of ref document: ES Kind code of ref document: B1 Effective date: 20190401 |
Priority:
Application number | Publication number | Priority date | Filing date | Patent title
ES201631481A| ES2668723B1|2016-11-18|2016-11-18|MACHINE OF EXTRACTION, MAPPING AND STORAGE OF HIERARCHICAL DATA|
US15/585,929| US9830319B1|2016-11-18|2017-05-03|Hierarchical data extraction mapping and storage machine|